Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims to guide aircraft and control the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. Incorporating these novel technologies into ATC, a low-resource domain, requires large-scale annotated datasets to develop data-driven AI systems such as automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
translated by 谷歌翻译
This paper describes a simple yet efficient repetition-based modular system for speeding up the training of air traffic controllers (ATCos). For example, a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be met by an automatic system that acts as a pilot. In this paper, we aim to develop and integrate a pseudo-pilot agent into the ATCo training pipeline by merging diverse artificial intelligence (AI) powered modules. The system understands the voice communications issued by the ATCo and, in turn, generates a spoken prompt that follows the pilot's phraseology in reply to the initial communication. Our system mainly relies on open-source AI tools and air traffic control (ATC) databases, which demonstrates its simplicity and ease of replication. The overall pipeline is composed of the following: (1) a submodule that receives and pre-processes the input stream of raw audio; (2) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (3) a high-level ATC-related entity parser, which extracts relevant information from the communication, i.e., callsigns and commands; and finally, (4) a speech synthesizer submodule that generates responses based on the high-level ATC entities previously extracted. Overall, we show that this system could pave the way toward developing a real proof-of-concept pseudo-pilot system, speeding up the training of ATCos while drastically reducing its overall cost.
This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluated SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
We present a conceptually simple and intuitive method for calculating and measuring the dissimilarities among 2D shapes. Several methods for interpreting and visualizing the resulting dissimilarity matrix are presented and compared.
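Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model $\mathcal{N}$ because it is trained to have the same actual input/output behavior as $\mathcal{N}$ while creating neural representations that can be intervened upon in order to simulate the counterfactual input/output behavior of $\mathcal{N}$. Furthermore, we show that the best CPM for $\mathcal{N}$ performs comparably to $\mathcal{N}$ in making factual predictions, which means that the CPM can simply replace $\mathcal{N}$, leading to more explainable deployed models. Our code is available at https://github.com/frankaging/causal-proxy-model.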
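Machine learning and computer vision are dynamically growing fields that have proven able to solve very complex tasks. They can also be used to monitor honeybee colonies and inspect their health state, which could identify potentially dangerous states before the situation becomes critical, or help to better plan periodic bee colony inspections and thereby save significant costs. In this paper, we present an overview of state-of-the-art computer vision and machine learning applications used for bee monitoring. We also demonstrate the potential of these methods with the example of an automated bee counter algorithm. The paper is aimed at veterinary and apidology professionals and experts who might not be familiar with machine learning, in order to introduce them to its possibilities; therefore, each family of applications is opened by a brief theoretical introduction and motivation related to its underlying method. We hope that this paper will inspire other scientists to use machine learning techniques for other applications in bee monitoring.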
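Automatic pseudo-labeling is a powerful tool for tapping into large amounts of sequential unlabeled data. It is especially appealing in safety-critical applications of autonomous driving, where performance requirements are extreme, datasets are large, and manual labeling is very challenging. We propose to leverage the sequential nature of the captured data to boost the pseudo-labeling technique in a teacher-student setup by training multiple teachers, each with access to different temporal information. This set of teachers, dubbed Concordance, provides higher-quality pseudo-labels for student training than standard methods. The outputs of the multiple teachers are combined via a novel pseudo-label confidence-guided criterion. Our experimental evaluation focuses on the 3D point cloud domain in urban driving scenarios. We show the performance of our method applied to multiple model architectures, on the tasks of 3D semantic segmentation and 3D object detection, on two benchmark datasets. Our method, which uses only 20% of the manual labels, outperforms some fully supervised methods. A notable performance boost is achieved for classes that rarely appear in the training data, such as bicycles and pedestrians. The implementation of our approach is publicly available at https://github.com/ctu-vras/t-concord3d.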
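The abstract does not spell out the confidence-guided criterion, so the following is only a minimal sketch of one plausible reading: for each point, keep the pseudo-label from whichever teacher is most confident. The function name `combine_pseudo_labels`, the `(label, confidence)` data layout, and the example values are hypothetical and are not taken from the T-Concord3D code.

```python
# Illustrative only: a max-confidence rule for merging teacher outputs.
# This is an assumed stand-in, not the criterion used in the paper.

def combine_pseudo_labels(teacher_outputs):
    """Merge per-point predictions from several teachers.

    teacher_outputs: one list per teacher, each holding a
    (label, confidence) tuple for every point, in the same point order.
    Returns one (label, confidence) tuple per point, taken from the
    teacher whose confidence is highest for that point.
    """
    n_points = len(teacher_outputs[0])
    combined = []
    for i in range(n_points):
        # Candidate predictions for point i, one per teacher.
        candidates = [output[i] for output in teacher_outputs]
        combined.append(max(candidates, key=lambda pred: pred[1]))
    return combined


# Hypothetical outputs of two teachers for two points.
teacher_a = [("car", 0.9), ("bicycle", 0.4)]
teacher_b = [("car", 0.7), ("pedestrian", 0.8)]
merged = combine_pseudo_labels([teacher_a, teacher_b])
print(merged)
```

A rule like this needs no confidence threshold, but it trusts each teacher's confidence to be comparable across teachers, which would have to be checked in practice.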
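The diversity of terrestrial vascular plants plays a key role in maintaining stable, healthy, and productive ecosystems. Although remote sensing is regarded as a promising and cost-effective proxy for estimating plant diversity, quantitative studies on how to infer plant diversity from spaceborne hyperspectral data are lacking. In this study, we assess the ability of hyperspectral data captured by the DLR Earth Sensing Imaging Spectrometer (DESIS) to estimate plant species richness in the Southern Tablelands and Snowy Mountains regions of southeast Australia. Spectral features were first extracted from DESIS spectra with principal component analysis, canonical correlation analysis, and partial least squares analysis. Regression was then conducted between the extracted features and plant species richness using ordinary least squares regression, kernel ridge regression, and Gaussian process regression. Results were assessed with the correlation coefficient ($r$) and root-mean-square error (RMSE), based on a two-fold cross-validation scheme. With the best-performing model, $r$ is 0.71 and RMSE is 5.99 for the Southern Tablelands region, while $r$ is 0.62 and RMSE is 6.20 for the Snowy Mountains region. The assessment results reported in this study provide support for future studies on the relationship between spaceborne hyperspectral measurements and terrestrial plant biodiversity.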
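For readers unfamiliar with the two evaluation measures named above, here is a minimal sketch of how the Pearson correlation coefficient and RMSE are computed from observed and predicted values. The function names and the species-richness numbers are hypothetical illustrations, not data or code from the study.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical species-richness values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(pearson_r(observed, predicted), rmse(observed, predicted))
```

The two measures answer different questions: $r$ captures how well predictions track the ordering and linear trend of the observations, while RMSE reports the typical error in the units of the target, here number of species.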
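The field of automatic biomedical image analysis crucially depends on robust and meaningful performance metrics for algorithm validation. Current metric usage, however, is often ill-informed and does not reflect the underlying domain interest. Here, we present a comprehensive framework that guides researchers towards choosing performance metrics in a problem-aware manner. Specifically, we focus on biomedical image analysis problems that can be interpreted as a classification task at the image, object, or pixel level. The framework first compiles domain interest-, target structure-, data set-, and algorithm output-related properties of a given problem into a problem fingerprint, while also mapping it to the appropriate problem category, namely image-level classification, semantic segmentation, instance segmentation, or object detection. It then guides users through the process of selecting and applying a set of appropriate validation metrics while making them aware of potential pitfalls related to individual choices. In this paper, we describe the current status of the Metrics Reloaded recommendation framework, with the goal of obtaining constructive feedback from the image analysis community. The current version has been developed within an international consortium of more than 60 image analysis experts and will be made openly available as a user-friendly toolkit after community-driven optimization.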
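While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the automatic algorithms used, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These pitfalls are typically related to (1) disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) disregard of inherent data set properties, such as the non-independence of the test cases, and (3) disregard of the actual biomedical domain interest that the metrics should reflect. The purpose of this dynamic document is to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.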
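The class-imbalance pitfall in point (1) can be illustrated with a small sketch. The data below are hypothetical (5 positive cases out of 100) and the helper functions are illustrative, not part of the document described above: a classifier that always predicts the negative class scores high accuracy while detecting no positive case at all.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of positive reference cases (label 1) that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Hypothetical imbalanced test set: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the negative class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Accuracy is dominated by the majority class here, which is why a metric chosen without regard to class prevalence can hide a complete failure on the class of interest.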